Using Information Extraction to Aid the Discovery of Prediction Rules from Text

نویسندگان

  • Un Yong Nahm
  • Raymond J. Mooney
چکیده

ABSTRACT Text mining and Information Extra tion (IE) are both topi s of signi ant re ent interest. Text mining on erns applying data mining, a.k.a. knowledge dis overy from databases (KDD) te hniques to unstru tured text. Information extra tion (IE) is a form of shallow text understanding that lo ates spe i pie es of data in natural language do uments, transforming unstru tured text into a stru tured database. This paper des ribes a system alled Dis oTEX, that ombines IE and KDD methods to perform a text mining task, disovering predi tion rules from natural-language orpora. An initial version of Dis oTEX is onstru ted by integrating an IE module based on Rapier and a rule-learning module, Ripper. We present en ouraging results on applying these te hniques to a orpus of omputer job postings from an Internet newsgroup.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS

<span style="color: #000000; font-family: Tahoma, sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: -webkit-left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; ba...

متن کامل

INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS

<span style="color: #000000; font-family: Tahoma, sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: -webkit-left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; ba...

متن کامل

Knowledge Extraction from the Neural ‘Black Box’ in Ecological Monitoring

Phytoplankton biomass within the Saginaw Bay ecosystem (Lake Huron, Michigan, USA) was characterized as a function of select physical/chemical indicators. The complexity and variability of ecological systems typically make it difficult to model the influences of anthropogenic stressors and/or natural disturbances. Here, Artificial Neural Networks (ANNs) were developed to model chlorophyll a con...

متن کامل

Text Mining with Information Extraction

The popularity of the Web and the large number of documents available in electronic form has motivated the search for hidden knowledge in text collections. Consequently, there is growing research interest in the general topic of text mining. In this paper, we develop a text-mining system by integrating methods from Information Extraction (IE) and Data Mining (Knowledge Discovery from Databases ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000